University of Minnesota

We consider the spectral properties of a class of regularized esti- 
mators of (large) empirical covariance matrices corresponding to sta- 
tionary (but not necessarily Gaussian) sequences, obtained by band- 
ing. We prove a law of large numbers (similar to that proved in the 
Gaussian case by Bickel and Levina), which implies that the spectrum 
of a banded empirical covariance matrix is an efficient estimator. Our 
main result is a central limit theorem in the same regime, which to 
our knowledge is new, even in the Gaussian setup. 

1. Introduction. We consider in this paper the spectral properties of a 
class of regularized estimators of (large) covariance matrices. More precisely, 
let $X = X^{(p)}$ be a data matrix of $n$ independent rows, with each row being
a sample of length $p$ from a mean zero stationary sequence $\{Z_j\}$ whose
covariance sequence satisfies appropriate regularity conditions (for details
on those, see Assumption 2.2). Let $X^T X$ denote the empirical covariance
matrix associated with the data. We recall that such empirical matrices, as
well as their centered versions $(X - \bar X)^T (X - \bar X)$, where $\bar X_{ij} = n^{-1} \sum_{k=1}^{n} X_{kj}$,
are often used as estimators of the covariance matrix of the sequence $\{Z_j\}$;
see [2]. We remark that the information contained in the eigenvalues of the
covariance matrix is often of interest, for example, in principal component 
analysis or applications in signal processing. 

In the situation where both p and n tend to infinity, it is a standard conse- 
quence of random matrix theory that these estimators may not be consistent. 
To address this issue, modifications have been proposed (see [3]) to which 
we refer for motivation, background and further references. Following the 
approach of [3], we consider regularization by banding, that is, by replacing
those entries of $X^T X$ that are at distance exceeding $b = b(p)$ away from the
diagonal by 0. Let $Y = Y^{(p)}$ denote the thus regularized empirical matrix.
We focus on the empirical measure of the eigenvalues of the matrix $Y$. In
the situation where $n \to \infty$, $p \to \infty$, $b \to \infty$ and $b/n \to 0$ with $b \le p$, we give
in Theorem 2.3 a law of large numbers (showing that the empirical measure 
can be used to construct an efficient estimator of averages across frequency 
of powers of the spectral density of the stationary sequence {Zj}), and in 
Theorem 2.4, we provide a central limit theorem for traces of powers of Y. 
We defer to Section 9 comments on possible extensions of our approach, as 
well as on its limitations. We note that in the particular case of Gaussian 
data matrices with explicit decay rate of the covariance sequence, and further 
assuming $b \sim (\sqrt{n}/\log p)^{\alpha}$ for some constant $\alpha > 0$, the law of large numbers
is contained (among many other things) in [3], Theorem 1. But even in that 
case, to our knowledge, our central limit theorem (Theorem 2.4) is new. 

2. The model and the main results. Throughout, let p be a positive 
integer, let b = b(p) and n = n(p) be positive numbers depending on p, with 
n an integer. (Many objects considered below depend on p, but we tend to 
suppress explicit reference to p in the notation.) We assume the following 
concerning these numbers: 

Assumption 2.1. As $p \to \infty$, we have $b \to \infty$, $n \to \infty$ and $b/n \to 0$, with
$b \le p$.

For any sequence of random variables $U_1, \dots, U_n$, we let $C(U_1, \dots, U_n)$
denote their joint cumulant. (See Section 4 below for the definition of joint
cumulants and a review of their properties.) Let
\[ \{Z_j\}_{j=-\infty}^{\infty} \]
be a stationary sequence of real random variables, satisfying the following
conditions:

Assumption 2.2.
\begin{align}
& E(|Z_0|^k) < \infty \quad \text{for all } k \ge 1, \tag{1} \\
& E Z_0 = 0, \tag{2} \\
& \sum_{j_1} \cdots \sum_{j_r} |C(Z_0, Z_{j_1}, \dots, Z_{j_r})| < \infty \quad \text{for all } r \ge 1. \tag{3}
\end{align}



We refer to (3) as joint cumulant summability. In Section 2.4 below we 
describe a class of examples of sequences satisfying Assumption 2.2. 
2.1. Random matrices. Let
\[ \{\{Z_j^{(i)}\}_{j=-\infty}^{\infty}\}_{i=1}^{\infty} \]
be an i.i.d. family of copies of $\{Z_j\}_{j=-\infty}^{\infty}$. Let $X = X^{(p)}$ be the $n$-by-$p$ random
matrix with entries
\[ X(i,j) = X_{ij} = Z_j^{(i)}/\sqrt{n}. \]
Let $B = B^{(p)}$ be the $p$-by-$p$ deterministic matrix with entries
\[ B(i,j) = B_{ij} = \begin{cases} 1, & \text{if } |i-j| \le b, \\ 0, & \text{if } |i-j| > b. \end{cases} \]
Let $Y = Y^{(p)}$ be the $p$-by-$p$ random symmetric matrix with entries
\[ Y(i,j) = Y_{ij} = B_{ij} (X^T X)_{ij} \tag{4} \]
and eigenvalues $\{\lambda_i^{(p)}\}_{i=1}^{p}$. Let
\[ L = L^{(p)} = p^{-1} \sum_{i=1}^{p} \delta_{\lambda_i^{(p)}} \tag{5} \]
be the empirical measure of the eigenvalues of $Y$. Our attention will be
focused on the limiting behavior of $L$ as $p \to \infty$.
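
As an illustration of definitions (4) and (5), the following is a minimal Python sketch of the banding construction. The function names (`band_mask`, `banded_sample_covariance`, `empirical_spectrum`) and the use of NumPy are assumptions made for this example only; the input `Z` stands for an n-by-p array whose rows play the role of the independent samples $(Z_1^{(i)}, \dots, Z_p^{(i)})$.

```python
import numpy as np

def band_mask(p, b):
    """Matrix B with B(i, j) = 1 if |i - j| <= b and 0 otherwise."""
    idx = np.arange(p)
    return (np.abs(idx[:, None] - idx[None, :]) <= b).astype(float)

def banded_sample_covariance(Z, b):
    """Return Y = B * (X^T X) entrywise, where X = Z / sqrt(n) as in (4)."""
    n, p = Z.shape
    X = Z / np.sqrt(n)
    return band_mask(p, b) * (X.T @ X)

def empirical_spectrum(Y):
    """Eigenvalues of the symmetric matrix Y; L in (5) puts mass 1/p on each."""
    return np.linalg.eigvalsh(Y)
```

Integrating a function against $L$ then amounts to averaging it over the array returned by `empirical_spectrum`.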

2.2. The measure $\nu_Z$. For integers $j$ let
\[ R(j) = \mathrm{Cov}(Z_0, Z_j). \tag{6} \]
Since $C(Z_0, Z_j) = \mathrm{Cov}(Z_0, Z_j)$, a consequence of (3) is the existence of the
spectral density $f_Z : [0,1] \to \mathbb{R}$ associated with the sequence $\{Z_j\}$, defined to
be the Fourier transform
\[ f_Z(\theta) = \sum_{j \in \mathbb{Z}} e^{2\pi i j \theta} R(j). \]
By the Szegő limit theorem [4], the empirical measure of the eigenvalues
of the matrix $R(|i-j|)_{i,j=1}^{N}$ converges, as $N \to \infty$, to the measure $\nu_Z := m \circ f_Z^{-1}$ on $\mathbb{R}$,
where $m$ denotes Lebesgue measure on $[0,1]$. (Note that, considering the
spectral density $f_Z$ as a random variable on the measure space $([0,1], m)$,
one can interpret $\nu_Z$ as its law.) It is immediate to check from the definition
that all moments of $\nu_Z$ are finite and are given by
\begin{align}
\int_{\mathbb{R}} x^k \, \nu_Z(dx) = \int_0^1 f_Z(\theta)^k \, d\theta &= \underbrace{R * R * \cdots * R}_{k}(0) \notag \\
&= \sum_{\substack{i_1, \dots, i_k \in \mathbb{Z} \\ i_1 + \cdots + i_k = 0}} \mathrm{Cov}(Z_0, Z_{i_1}) \cdots \mathrm{Cov}(Z_0, Z_{i_k}), \tag{7}
\end{align}
where $*$ denotes convolution:
\[ (F * G)(j) = \sum_{k \in \mathbb{Z}} F(j-k) G(k), \]
for any two summable functions $F, G : \mathbb{Z} \to \mathbb{R}$. Note that (7) could just as
well serve as the definition of $\nu_Z$.
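
The identity (7) can be checked numerically when the covariance sequence has finite support. The sketch below is only an illustration under that assumption: `R` is a hypothetical dictionary mapping lags to covariances (symmetric in the lag), `conv_power_at_zero` evaluates the $k$-fold convolution at 0, and `spectral_moment` evaluates $\int_0^1 f_Z(\theta)^k\,d\theta$ by a Riemann sum, which is exact for trigonometric polynomials whose degree is below the grid size.

```python
import numpy as np

def conv_power_at_zero(R, k):
    """(R * ... * R)(0) with k factors, for R = {lag: value} of finite support."""
    m = max(abs(lag) for lag in R)
    r = np.zeros(2 * m + 1)
    for lag, value in R.items():
        r[lag + m] = value              # index m corresponds to lag 0
    out = np.array([1.0])               # the delta function at lag 0
    for _ in range(k):
        out = np.convolve(out, r)
    return out[k * m]                   # index k*m corresponds to lag 0

def spectral_moment(R, k, grid=4096):
    """Riemann-sum approximation of the integral of f_Z(theta)^k over [0, 1]."""
    theta = np.arange(grid) / grid
    f = sum(v * np.exp(2j * np.pi * lag * theta) for lag, v in R.items()).real
    return float(np.mean(f ** k))
```

For example, with `R = {-1: 0.5, 0: 1.25, 1: 0.5}` both functions return $1.25^2 + 2 \cdot 0.5^2 = 2.0625$ for $k = 2$, up to rounding.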

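2.3. The coefficients $Q_{ij}$ and $R_i^{(m)}$. With notation as in (3), (6), (7), for
integers $m > 0$ and all integers $i$ and $j$, we write
\[ Q_{ij} = \sum_{\ell \in \mathbb{Z}} C(Z_i, Z_0, Z_{j+\ell}, Z_\ell), \qquad R_i^{(m)} = \underbrace{R * \cdots * R}_{m}(i), \qquad R_i^{(0)} = \delta_{i0}. \tag{8} \]
By (3) the array $Q_{ij}$ is well defined and summable:
\[ \sum_{i,j \in \mathbb{Z}} |Q_{ij}| < \infty. \tag{9} \]
The array $Q_{ij}$ is also symmetric:
\[ Q_{ij} = \sum_{\ell \in \mathbb{Z}} C(Z_{i-\ell}, Z_{-\ell}, Z_j, Z_0) = Q_{ji}, \tag{10} \]
by stationarity of $\{Z_j\}$ and symmetry of $C(\cdot,\cdot,\cdot,\cdot)$ under exchange of its
arguments.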

The following are the main results of this paper. 

Theorem 2.3 (Law of large numbers). Let Assumptions 2.1 and 2.2
hold. Let $L = L^{(p)}$ be as in (5). Let $\nu_Z$ be as in (7). Then $L$ converges
weakly to $\nu_Z$, in probability.

In other words, Theorem 2.3 implies that $L$ is a consistent estimator of
$\nu_Z$, in the sense of weak convergence.
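
A small simulation can illustrate Theorem 2.3 at the level of moments. The sketch below is not part of the paper's argument: it assumes a Gaussian MA(1) sequence $Z_j = W_j + \theta W_{j-1}$ (so $R(0) = 1 + \theta^2$, $R(\pm 1) = \theta$), hypothetical function names, and arbitrary sizes $n$, $p$, $b$ chosen so that $b/n$ is small. It compares the empirical moments $p^{-1}\operatorname{trace} Y^k$ with the $k$-fold convolution of $R$ evaluated at 0, the moments given in (7).

```python
import numpy as np

def simulate_banded_moments(n, p, b, theta, ks, seed=0):
    """Empirical moments p^{-1} trace Y^k of the banded matrix Y, for k in ks."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, p + 1))
    Z = W[:, 1:] + theta * W[:, :-1]          # MA(1) rows, independent across rows
    X = Z / np.sqrt(n)
    idx = np.arange(p)
    B = np.abs(idx[:, None] - idx[None, :]) <= b
    Y = B * (X.T @ X)                          # banding as in (4)
    eig = np.linalg.eigvalsh(Y)
    return [float(np.mean(eig ** k)) for k in ks]

def limit_moments(theta, ks):
    """k-fold convolution of R at lag 0 for the MA(1) covariance sequence."""
    r = np.array([theta, 1.0 + theta ** 2, theta])   # lags -1, 0, 1
    out = []
    for k in ks:
        c = np.array([1.0])
        for _ in range(k):
            c = np.convolve(c, r)
        out.append(float(c[k]))                # index k corresponds to lag 0
    return out
```

With, say, `n = 2000`, `p = 200`, `b = 10` and `theta = 0.5`, the two lists should agree up to a relative error of order $b/n$ plus sampling noise.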



Theorem 2.4 (Central limit theorem). Let Assumptions 2.1 and 2.2
hold. Let $Y = Y^{(p)}$ be as in (4). Let $Q_{ij}$ and $R_i^{(m)}$ be as in (8). Then the
process
\[ \left\{ \sqrt{\frac{n}{p}} \left( \operatorname{trace} Y^k - E \operatorname{trace} Y^k \right) \right\}_{k=1}^{\infty} \]
converges in distribution as $p \to \infty$ to a zero mean Gaussian process $\{G_k\}_{k=1}^{\infty}$
with covariance specified by the formula
\[ \frac{1}{k\ell} E\, G_k G_\ell = 2 R_0^{(k+\ell)} + \sum_{i,j \in \mathbb{Z}} R_i^{(k-1)} Q_{ij} R_j^{(\ell-1)}. \tag{11} \]
Note that the "correction" $Q_{ij}$ vanishes identically if $\{Z_j\}$ is Gaussian;
compare Lemma 4.2 below.

2.4. Some stationary sequences satisfying Assumption 2.2. Fix a summable
function $h : \mathbb{Z} \to \mathbb{R}$ and an i.i.d. sequence $\{W_\ell\}_{\ell=-\infty}^{\infty}$ of mean zero real random
variables with moments of all orders. Now convolve: put $Z_j = \sum_{\ell} h(j + \ell) W_\ell$
for every $j$. It is immediate that (1) and (2) hold. To see the summability
condition (3) on joint cumulants, assume at first that $h$ has finite
support. Then, by standard properties of joint cumulants (the main point is
covered by Lemma 4.1 below), we get the formula
\[ C(Z_{j_0}, \dots, Z_{j_r}) = \sum_{\ell} h(j_0 + \ell) \cdots h(j_r + \ell)\, C(\underbrace{W_0, \dots, W_0}_{r+1}), \tag{12} \]
which leads by a straightforward limit calculation to the analogous formula
without the assumption of finite support of $h$, whence in turn verification of
(3).
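
For a filter $h$ of finite support, formula (12) reduces to a finite sum and is easy to evaluate. The following sketch assumes a hypothetical representation of $h$ as a dictionary from lags to coefficients and takes the $(r+1)$-th cumulant of $W_0$ as an input `kappa`; neither convention comes from the paper.

```python
def linear_process_cumulant(h, js, kappa):
    """Joint cumulant C(Z_{j_0}, ..., Z_{j_r}) from (12).

    h     : dict {lag: coefficient}, finite support
    js    : tuple (j_0, ..., j_r) of integer indices
    kappa : the cumulant C(W_0, ..., W_0) with r + 1 arguments
    """
    total = 0.0
    # h(j_0 + l) is nonzero only when j_0 + l lies in the support of h.
    for s in h:
        l = s - js[0]
        term = 1.0
        for j in js:
            term *= h.get(j + l, 0.0)
        total += term
    return kappa * total
```

With `h = {0: 1.0, 1: 0.5}` and unit-variance $W_0$, taking `js = (0, j)` recovers the covariances $R(0) = 1.25$, $R(\pm 1) = 0.5$ and $R(j) = 0$ for $|j| > 1$.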

2.5. Structure of the paper. The proofs of Theorems 2.3 and 2.4 require 
a fair number of preliminaries. We provide them in the next few sections. 
In Section 3, we introduce some notation involving set partitions, and prove 
Proposition 3.1, which summarizes the properties of set partitions that we 
need. In spirit, if not in precise details, this section builds on [1]. In Section
4, we introduce joint cumulants and the Möbius inversion formula relating
cumulants to moments, and in Section 5 we use the latter to calculate joint
cumulants of random variables of the form $\operatorname{trace} Y^k$ by manipulation of set
partitions; see Proposition 5.2. In Section 6 we carry out some preliminary
limit calculations in order to identify the dominant terms in the sums
representing joint cumulants of random variables of the form $\operatorname{trace} Y^k$. Finally,
the proofs of Theorems 2.3 and 2.4 are completed in Sections 7 and 8,
respectively.

3. A combinatorial estimate. 

3.1. Set partitions. Given a positive integer $k$, we define $\mathrm{Part}(k)$ to be
the family of subsets of the power set $2^{\{1,\dots,k\}}$ consisting of sets $\Pi$ such
that (i) $\emptyset \notin \Pi$, (ii) $\bigcup_{A \in \Pi} A = \{1,\dots,k\}$, and (iii) for all $A, B \in \Pi$, if $A \ne B$,
then $A \cap B = \emptyset$. Elements of $\mathrm{Part}(k)$ are called set partitions of $\{1,\dots,k\}$,
or context permitting simply partitions. Sometimes we call members of a
partition parts. Given $\Pi, \Sigma \in \mathrm{Part}(k)$, we say that $\Sigma$ refines $\Pi$ (or is finer
than $\Pi$) if for every $A \in \Sigma$ there exists some $B \in \Pi$ such that $A \subset B$. Given
$\Pi, \Sigma \in \mathrm{Part}(k)$, let $\Pi \vee \Sigma \in \mathrm{Part}(k)$ be the least upper bound of $\Pi$ and $\Sigma$,
that is, the finest partition refined by both $\Pi$ and $\Sigma$. We call $\Pi \in \mathrm{Part}(k)$
a perfect matching if every part of $\Pi$ has cardinality 2. Let $\mathrm{Part}_2(k)$ be
the subfamily of $\mathrm{Part}(k)$ consisting of partitions $\Pi$ such that every part has
cardinality at least 2. The cardinality of a set $S$ is denoted $\#S$, and $\lfloor x \rfloor$
denotes the greatest integer not exceeding $x$.

Proposition 3.1. Let $k$ be a positive integer. Let $\Pi_0, \Pi_1, \Pi \in \mathrm{Part}_2(2k)$
be given. Assume that $\Pi_0$ and $\Pi_1$ are perfect matchings. Assume that
$\#\Pi_0 \vee \Pi_1 \vee \Pi = 1$. Then we have
\[ \#\Pi_0 \vee \Pi + \#\Pi_1 \vee \Pi \le 1 + \#\Pi \le k + 1 \tag{13} \]
and
\[ r > 1 \;\Rightarrow\; \#\Pi_0 \vee \Pi + \#\Pi_1 \vee \Pi \le k + 1 - \lfloor r/2 \rfloor, \tag{14} \]
where $r = \#\Pi_0 \vee \Pi_1$.

The proposition is very close to [1], Lemma 4.10, almost a reformulation.
But because the setup of [1] is rather different from the present one, the
effort of translation is roughly equal to the effort of direct proof. We choose
to give a direct proof in order to keep the paper self-contained. The proof
will be finished in Section 3.5. In Section 9, we provide some comments
concerning possible improvements of Proposition 3.1.

3.2. Graphs. We fix notation and terminology. The reader is encouraged
to glance at Figure 1 when reading the rest of this section for an illustration
of the various definitions in a concrete example.

3.2.1. Basic definitions. For us a graph $G = (V,E)$ is a pair consisting
of a finite set $V$ and a subset $E \subset 2^V$ of the power set of $V$ such that every
member of $E$ has cardinality 1 or 2. Elements of $V$ are called vertices and
elements of $E$ are called edges. A walk $w$ on $G$ is a sequence $w = v_1 v_2 \cdots v_n$
of vertices of $G$ such that $\{v_i, v_{i+1}\} \in E$ for $i = 1, \dots, n-1$, and in this
situation we say that the initial point $v_1$ and terminal point $v_n$ of the walk
are joined by $w$. A graph is connected if every two vertices are joined by a
walk. For any connected graph, $\#V \le 1 + \#E$. A graph $G = (V,E)$ is called
a tree if connected and further $\#V = 1 + \#E$. Alternatively, a connected
graph $G = (V,E)$ is a tree if and only if there exists no edge $e \in E$ such that
the subgraph $G' = (V, E \setminus \{e\})$ gotten by "erasing" the edge $e$ is connected.

For future reference, we quote without proof the following elementary
lemma.

Lemma 3.2 (Parity principle). Let $w = v_1 \cdots v_n$ be a walk on a tree $T =
(V,E)$ beginning and ending at the same vertex, that is, such that $v_1 = v_n$. Then
$w$ visits every edge of $T$ an even number of times, that is,
\[ \#\{ i \in \{1, \dots, n-1\} \mid \{v_i, v_{i+1}\} = e \} \]
is an even number for every $e \in E$.
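3.3. Reduction of $\Pi_0$ and $\Pi_1$ to standard form. After relabeling the
elements of $\{1, \dots, 2k\}$, we may assume that for some positive integers $k_1, \dots, k_r$
summing to $k$ we have
\[ \Pi_0 \vee \Pi_1 = \{ (K_{\alpha-1}, K_\alpha] \cap \mathbb{Z} \mid \alpha = 1, \dots, r \}, \]
where $K_\alpha = 2 \sum_{\beta \le \alpha} k_\beta$ for $\alpha = 0, \dots, r$, and after some further relabeling,
we may assume that
\[ \Pi_0 = \{ \{2i-1, 2i\} \mid i = 1, \dots, k \}. \]
It is well known (and easily checked) that for any perfect matchings $\Sigma_0, \Sigma_1 \in
\mathrm{Part}_2(2k)$, the graph $(\{1, \dots, 2k\}, \Sigma_0 \cup \Sigma_1)$ is a disjoint union of $\#\Sigma_0 \vee \Sigma_1$
graphs of the form
\[ (\{1,2\}, \{\{1,2\}\}), \qquad (\{1,2,3,4\}, \{\{1,2\},\{2,3\},\{3,4\},\{4,1\}\}), \]
\[ (\{1,2,3,4,5,6\}, \{\{1,2\},\{2,3\},\{3,4\},\{4,5\},\{5,6\},\{6,1\}\}) \]
and so on. (The intuition is that the members of $\Sigma_0$ and $\Sigma_1$ "join hands"
alternately to form cycles.) Thus, after a final round of relabeling, we may
assume that
\[ \Pi_1 = \bigcup_{\alpha=1}^{r} \left( \{\{ i_{2k_\alpha}^{(\alpha)}, i_1^{(\alpha)} \}\} \cup \{ \{ i_{2\nu}^{(\alpha)}, i_{2\nu+1}^{(\alpha)} \} \mid \nu = 1, \dots, k_\alpha - 1 \} \right), \]
where $i_\nu^{(\alpha)} = K_{\alpha-1} + \nu$. Note that
\[ \Pi_0 = \{ \{ i_{2\nu-1}^{(\alpha)}, i_{2\nu}^{(\alpha)} \} \mid \alpha = 1, \dots, r,\ \nu = 1, \dots, k_\alpha \}, \]
\[ \Pi_0 \vee \Pi_1 = \{ \{ i_1^{(\alpha)}, \dots, i_{2k_\alpha}^{(\alpha)} \} \mid \alpha = 1, \dots, r \} \]
in terms of the notation introduced to describe $\Pi_1$.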

3.4. Graph-theoretical "coding" of $\Pi$.

3.4.1. Construction of a graph $G$. For $i = 0, 1$, let
\[ \varphi_i : \{1, \dots, 2k\} \to V_i \]
be an onto function such that $\Pi_i \vee \Pi$ is the family of level sets for $\varphi_i$. Assume
further that $V_0 \cap V_1 = \emptyset$. We now define a graph $G = (V,E)$ by declaring
that
\[ V = V_0 \cup V_1, \qquad E = \{ \{\varphi_0(i), \varphi_1(i)\} \mid i = 1, \dots, 2k \}. \]

Lemma 3.3. $G$ is connected.

Because $\varphi_i(j) = \varphi_i(\ell)$ for $i = 0,1$ if $j, \ell$ belong to the same part of $\Pi$, we
must have $\#E \le \#\Pi$. Further, $\#\Pi \le k$ since $\Pi \in \mathrm{Part}_2(2k)$. Thus, using
Lemma 3.3 in the first inequality, we have
\[ \#\Pi_0 \vee \Pi + \#\Pi_1 \vee \Pi = \#V \le 1 + \#E \le 1 + \#\Pi \le k + 1, \]

which proves inequality (13) of Proposition 3.1. 
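
For small $k$ the inequalities (13) and (14) of Proposition 3.1 can also be confirmed by exhaustive enumeration. The sketch below is only a sanity check, not a substitute for the proof: partitions are represented as lists of blocks of `range(2 * k)`, joins are computed with a union-find structure, and all function names are assumptions of this example.

```python
from itertools import product

def set_partitions(elems):
    """Yield every set partition of the list elems as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):               # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                   # or open a new block

def join_size(n, *partitions):
    """Number of parts of the join (least upper bound) of partitions of range(n)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for part in partitions:
        for block in part:
            for x in block[1:]:
                parent[find(x)] = find(block[0])
    return len({find(x) for x in range(n)})

def check_proposition(k):
    """Return True if (13) and (14) hold for every admissible triple."""
    n = 2 * k
    part2 = [P for P in set_partitions(list(range(n)))
             if all(len(A) >= 2 for A in P)]
    matchings = [P for P in part2 if all(len(A) == 2 for A in P)]
    for P0, P1 in product(matchings, repeat=2):
        r = join_size(n, P0, P1)
        for P in part2:
            if join_size(n, P0, P1, P) != 1:
                continue
            lhs = join_size(n, P0, P) + join_size(n, P1, P)
            if lhs > 1 + len(P) or len(P) > k:        # inequality (13)
                return False
            if r > 1 and lhs > k + 1 - r // 2:        # inequality (14)
                return False
    return True
```

The enumeration is feasible only for very small $k$, since the number of partitions of a $2k$-element set grows rapidly.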

Proof of Lemma 3.3. Suppose rather that we have a decomposition
$V = X \cup Y$ where $X \cap Y = \emptyset$, $X \ne \emptyset$, $Y \ne \emptyset$, and no edge of $G$ joins a
vertex in $X$ to a vertex in $Y$. Consider the subsets
\[ I = \varphi_0^{-1}(V_0 \cap X) \cup \varphi_1^{-1}(V_1 \cap X), \qquad J = \varphi_0^{-1}(V_0 \cap Y) \cup \varphi_1^{-1}(V_1 \cap Y) \]
of $\{1, \dots, 2k\}$. Clearly $I \cup J = \{1, \dots, 2k\}$, $I \ne \emptyset$, and $J \ne \emptyset$. We claim that
$I \cap J = \emptyset$. Suppose rather that there exists $i \in I \cap J$. Then we must either
have $\varphi_0(i) \in V_0 \cap X$ and $\varphi_1(i) \in V_1 \cap Y$, or else $\varphi_1(i) \in V_1 \cap X$ and $\varphi_0(i) \in
V_0 \cap Y$. In either case we have exhibited an edge of $G$ connecting a vertex
in $X$ to a vertex in $Y$, which is a contradiction. Therefore $I \cap J = \emptyset$. Thus
the set $\{I, J\} \in \mathrm{Part}(2k)$ is a partition refined by both $\Pi_0 \vee \Pi$ and $\Pi_1 \vee \Pi$,
which is a contradiction to $\#\Pi_0 \vee \Pi_1 \vee \Pi = 1$. Therefore $G$ is connected. □



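Lemma 3.4. There exist walks
\[ w^{(\alpha)} = v_1^{(\alpha)} \cdots v_{2k_\alpha+1}^{(\alpha)} \quad \text{for } \alpha = 1, \dots, r \]
on $G$ such that
\[ v_1^{(\alpha)} = v_{2k_\alpha+1}^{(\alpha)}, \qquad \{ \varphi_0(i_\nu^{(\alpha)}), \varphi_1(i_\nu^{(\alpha)}) \} = \{ v_\nu^{(\alpha)}, v_{\nu+1}^{(\alpha)} \} \]
for $\alpha = 1, \dots, r$ and $\nu = 1, \dots, 2k_\alpha$.

Proof. We define
\[ v_\nu^{(\alpha)} = \begin{cases} \varphi_1(i_\nu^{(\alpha)}), & \text{if } \nu \text{ is odd and } \nu < 2k_\alpha, \\ \varphi_0(i_\nu^{(\alpha)}), & \text{if } \nu \text{ is even}, \\ \varphi_1(i_1^{(\alpha)}), & \text{if } \nu = 2k_\alpha + 1, \end{cases} \]
for $\alpha = 1, \dots, r$ and $\nu = 1, \dots, 2k_\alpha + 1$. Clearly we have $v_1^{(\alpha)} = v_{2k_\alpha+1}^{(\alpha)}$.
Recalling that $\varphi_0$ by construction is constant on the set $\{i_1^{(\alpha)}, i_2^{(\alpha)}\} \in \Pi_0$, we
see that
\[ \{ \varphi_1(i_1^{(\alpha)}), \varphi_0(i_1^{(\alpha)}) \} = \{ \varphi_1(i_1^{(\alpha)}), \varphi_0(i_2^{(\alpha)}) \} = \{ v_1^{(\alpha)}, v_2^{(\alpha)} \}. \]
By similar considerations one checks the remaining claims of the lemma. We
omit further details. □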
Lemma 3.5. Assume that $r > 1$. For every $A \in \Pi_0 \vee \Pi_1$ there exists an
index $m \in A$, a set $A' \in \Pi_0 \vee \Pi_1$ distinct from $A$ and an index $m' \in A'$ such
that $\{\varphi_0(m), \varphi_1(m)\} = \{\varphi_0(m'), \varphi_1(m')\}$.

In other words, if $r > 1$, then for every walk $w^{(\alpha)}$, there is an edge $e$ of $G$
and another walk $w^{(\alpha')}$ such that both $w^{(\alpha)}$ and $w^{(\alpha')}$ visit $e$.

Proof of Lemma 3.5. Because $\#\Pi_0 \vee \Pi_1 \vee \Pi = 1$, given $A \in \Pi_0 \vee \Pi_1$,
there must exist $A' \in \Pi_0 \vee \Pi_1$ distinct from $A$ and a set $B \in \Pi$ such that
$A \cap B \ne \emptyset$ and $A' \cap B \ne \emptyset$. Choose $m \in A \cap B$ and $m' \in A' \cap B$. Because
the functions $\varphi_0$ and $\varphi_1$ are constant on the set $B$, we are done. □

3.5. Completion of the proof of Proposition 3.1. We have seen that
Lemma 3.3 proves inequality (13). We just have to prove inequality (14). Assume
that $r > 1$ for the rest of the proof. Consider the graph $G = (V, E)$ as in
Section 3.4. Let $E' \subset E$ be such that $T = (V, E')$ is a tree (such a choice
is possible because $G$ is connected). It will be enough to show that $\#E' \le
k - r/2$. Now we adapt to the present situation a device ("edge-bounding
tables") introduced in the proof of [1], Lemma 4.10. Let us call a function
$f : \{1, \dots, 2k\} \to \{0, 1\}$ a good estimator under the following conditions:

• For all $i \in \{1, \dots, 2k\}$, if $f(i) = 1$, then $\{\varphi_0(i), \varphi_1(i)\} \in E'$.

• For each $e \in E'$ there exist distinct $i, j \in \{1, \dots, 2k\}$ such that $e = \{\varphi_0(i),
\varphi_1(i)\} = \{\varphi_0(j), \varphi_1(j)\}$ and $f(i) = f(j) = 1$.

• For each $e \in E'$ and $A \in \Pi_0 \vee \Pi_1$, if there exists $\ell \in A$ such that $e =
\{\varphi_0(\ell), \varphi_1(\ell)\}$, then there exists $\ell' \in A$ such that $e = \{\varphi_0(\ell'), \varphi_1(\ell')\}$ and
$f(\ell') = 1$.

For a good estimator $f$ we automatically have $\frac{1}{2} \sum_{i} f(i) \ge \#E'$. By
definition a good estimator is bounded above by the indicator of the set $\{i \in
\{1, \dots, 2k\} \mid \{\varphi_0(i), \varphi_1(i)\} \in E'\}$, and such an indicator function is an
example of a good estimator. Fix now any good estimator $f$. Suppose that on
some set $A = \{i_1^{(\alpha)}, \dots, i_{2k_\alpha}^{(\alpha)}\} \in \Pi_0 \vee \Pi_1$ the function $f$ is identically equal to
1. Then the corresponding walk $w^{(\alpha)}$ on $G$ is a walk on $T$, and by the Parity
Principle (Lemma 3.2) visits every edge of $T$ an even number of times. Select
$m \in A$ as in Lemma 3.5. Let $g$ be the function agreeing with $f$ everywhere
except that $g(m) = 0$. Then $g$ is again a good estimator. Continuing in this
way we can construct a good estimator not identically equal to 1 on any of
the $r$ sets $A \in \Pi_0 \vee \Pi_1$, so that $\sum_i f(i) \le 2k - r$, whence the desired estimate
$\#E' \le k - r/2$.
Figure 1 illustrates the various objects studied in this section.
[Figure 1 omitted.]

Fig. 1. Two different partitions $\Pi$ for which $k = 5$, $k_1 = 2$, $k_2 = 3$, such that both are
associated to the same graph $G = (V, E)$, where $V = \{a, b, c, d, e\}$. Note that both partitions
generate walks eaebe and ebecede on $G$.



4. Joint cumulants. 

4.1. Definition. Let $X_1, \dots, X_k$ be real random variables defined on a
common probability space with moments of all orders, in which case the
characteristic function $E \exp(\sum_{j=1}^{k} i t_j X_j)$ is an infinitely differentiable function
of the real variables $t_1, \dots, t_k$. One defines the joint cumulant $C(X_1, \dots, X_k)$
by the formula
\[ C(X_1, \dots, X_k) = C\{X_i\}_{i=1}^{k} = i^{-k} \left. \frac{\partial^k}{\partial t_1 \cdots \partial t_k} \log E \exp\left( \sum_{j=1}^{k} i t_j X_j \right) \right|_{t_1 = \cdots = t_k = 0}. \]
(The middle expression is a convenient abbreviated notation.) The quantity
$C(X_1, \dots, X_k)$ depends symmetrically and $\mathbb{R}$-multilinearly on $X_1, \dots, X_k$.
Moreover, dependence is continuous with respect to the $L^k$-norm. One has
in particular
\[ C(X) = EX, \qquad C(X,X) = \operatorname{Var} X, \qquad C(X,Y) = \operatorname{Cov}(X,Y). \]



The following standard properties of joint cumulants will be used. Proofs 
are omitted. 
Lemma 4.1. If there exists $0 < \ell < k$ such that the $\sigma$-fields $\sigma\{X_i\}_{i=1}^{\ell}$
and $\sigma\{X_i\}_{i=\ell+1}^{k}$ are independent, then $C(X_1, \dots, X_k) = 0$.

Lemma 4.2. The random vector $X_1, \dots, X_k$ has a Gaussian joint
distribution if and only if $C(X_{i_1}, \dots, X_{i_r}) = 0$ for every integer $r \ge 3$ and sequence
$i_1, \dots, i_r \in \{1, \dots, k\}$.

4.2. Combinatorial description of joint cumulants. As above, let $X_1, \dots, X_k$
be real random variables defined on a common probability space with
moments of all orders. Let $\Pi \in \mathrm{Part}(k)$ also be given. We define
\[ C_\Pi(X_1, \dots, X_k) = C_\Pi\{X_i\}_{i=1}^{k} = \prod_{A \in \Pi} C\{X_i\}_{i \in A}, \]
\[ E_\Pi(X_1, \dots, X_k) = E_\Pi\{X_i\}_{i=1}^{k} = \prod_{A \in \Pi} E \prod_{i \in A} X_i. \]
(The middle expressions are convenient abbreviations.) Note that if $X_1, \dots, X_k$
are zero mean random variables, then $C_\Pi(X_1, \dots, X_k)$ vanishes unless $\Pi \in
\mathrm{Part}_2(k)$. The formula
\[ E X_1 \cdots X_k = \sum_{\Pi \in \mathrm{Part}(k)} C_\Pi(X_1, \dots, X_k) \tag{15} \]
is well known, and can be verified in a straightforward way by manipulating
Taylor expansions of characteristic functions. More generally we have the
following lemma, whose proof can be found in [6], page 290.

Lemma 4.3. With $X_1, \dots, X_k$ as above, and for all $\Pi \in \mathrm{Part}(k)$, we have
\[ E_\Pi\{X_i\}_{i=1}^{k} = \sum_{\substack{\Sigma \in \mathrm{Part}(k) \\ \Sigma \text{ refines } \Pi}} C_\Sigma\{X_i\}_{i=1}^{k}, \tag{16} \]
\[ C_\Pi\{X_i\}_{i=1}^{k} = \sum_{\substack{\Sigma \in \mathrm{Part}(k) \\ \Sigma \text{ refines } \Pi}} \left( \prod_{A \in \Pi} (-1)^{\#\{B \in \Sigma \mid B \subset A\} - 1} (\#\{B \in \Sigma \mid B \subset A\} - 1)! \right) E_\Sigma\{X_i\}_{i=1}^{k}. \tag{17} \]

We will use the following algebraic fact to compute joint cumulants. For 
a proof see, for example, [7], Example 3.10.4. 
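Lemma 4.4 [Möbius inversion for the poset $\mathrm{Part}(k)$]. Let $A$ be an Abelian
group and let $f, g : \mathrm{Part}(k) \to A$ be functions. Then we have
\[ [\forall \Sigma \in \mathrm{Part}(k)] \quad f(\Sigma) = \sum_{\substack{\Pi \in \mathrm{Part}(k) \\ \Pi \text{ refines } \Sigma}} g(\Pi) \tag{18} \]
if and only if
\[ [\forall \Pi \in \mathrm{Part}(k)] \quad g(\Pi) = \sum_{\substack{\Sigma \in \mathrm{Part}(k) \\ \Sigma \text{ refines } \Pi}} \left( \prod_{A \in \Pi} (-1)^{\#\{B \in \Sigma \mid B \subset A\} - 1} (\#\{B \in \Sigma \mid B \subset A\} - 1)! \right) f(\Sigma). \tag{19} \]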

In applications below we will simply have A = R. 
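
To make the moment-cumulant relation concrete, the following sketch evaluates the special case of (17) in which $\Pi$ is the one-part partition, namely $C(X_1, \dots, X_k) = \sum_{\Sigma} (-1)^{\#\Sigma - 1} (\#\Sigma - 1)! \, E_\Sigma\{X_i\}$. It assumes a finitely supported joint distribution supplied as a hypothetical list of `(probability, outcome)` pairs; the function names are illustrative only.

```python
from math import factorial

def set_partitions(elems):
    """Yield every set partition of the list elems as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def moment(dist, idx):
    """E of the product of X_i over i in idx, for dist = [(prob, outcome), ...]."""
    total = 0.0
    for prob, x in dist:
        term = prob
        for i in idx:
            term *= x[i]
        total += term
    return total

def joint_cumulant(dist, idx):
    """C(X_{idx[0]}, ..., X_{idx[-1]}) via (17) with the one-part partition.

    Positions in idx are treated as distinct arguments, so repeated indices
    (e.g. [0, 0]) give cumulants with repeated variables.
    """
    total = 0.0
    for part in set_partitions(list(idx)):
        m = len(part)
        term = (-1) ** (m - 1) * factorial(m - 1)
        for block in part:
            term *= moment(dist, block)
        total += term
    return total
```

For two independent fair $\pm 1$ coins $X_0, X_1$, this returns $C(X_0, X_0) = 1$, $C(X_0, X_1) = 0$ (in line with Lemma 4.1) and the fourth cumulant $C(X_0, X_0, X_0, X_0) = -2$.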

5. Cumulant calculations. In the context of matrix models, cumulants 
are useful because they allow one to replace enumeration over arbitrary 
graphs by enumeration over connected graphs. We wish to mimic this idea 
in our context. We first describe the setup, and then perform some
computations that culminate in Proposition 5.2, which gives an explicit formula
for joint cumulants of random variables of the form $\operatorname{trace} Y^k$.

5.1. The setup. An $(n,k)$-word $\mathbf{i}$ is by definition a function
\[ \mathbf{i} : \{1, \dots, k\} \to \{1, \dots, n\}. \]
Given $\Pi \in \mathrm{Part}(k)$ and an $(n,k)$-word $\mathbf{i}$, we say that $\mathbf{i}$ is $\Pi$-measurable if $\mathbf{i}$ is
constant on each set belonging to $\Pi$. Similarly and more generally, we speak
of the $\Pi$-measurability of any function $\mathbf{i} : \{1, \dots, k\} \to \mathbb{Z}$.

Let $r$ be a positive integer. Let $k_1, \dots, k_r$ be positive integers and put
$k = k_1 + \cdots + k_r$. Let special perfect matchings $\Pi_0, \Pi_1 \in \mathrm{Part}(2k)$ be defined
as follows:
\begin{align*}
\Pi_0 &= \{\{1,2\}, \{3,4\}, \dots, \{2k-3, 2k-2\}, \{2k-1, 2k\}\}, \\
\Pi_1 &= \{\{2,3\}, \dots, \{K_1, 1\}, \{K_1+2, K_1+3\}, \dots, \{K_2, K_1+1\}, \\
&\qquad \dots, \{K_{r-1}+2, K_{r-1}+3\}, \dots, \{K_r, K_{r-1}+1\}\},
\end{align*}



A CLT FOR REGULARIZED SAMPLE CO VARIANCE MATRICES 13 

where $K_i = 2 \sum_{j=1}^{i} k_j$ for $i = 1, \dots, r$. (Thus $\Pi_0$ and $\Pi_1$ are in the standard
form discussed in Section 3.3 above.) To abbreviate, for any $\Pi \in \mathrm{Part}(2k)$
and $(p, 2k)$-word $\mathbf{j}$, put
\[ B(\mathbf{j}) = \prod_{\alpha=1}^{k} B(j(2\alpha - 1), j(2\alpha)), \qquad C_\Pi(\mathbf{j}) = C_\Pi(Z_{j(1)}, \dots, Z_{j(2k)}). \]
Note that, on the one hand, $B(\mathbf{j})$ depends on $p$ even though the notation
does not show the dependence. Note that, on the other hand, $C_\Pi(\mathbf{j})$ is
independent of $p$. Indeed, $C_\Pi(\mathbf{j})$ remains well defined by the formula above for
any function $\mathbf{j} : \{1, \dots, 2k\} \to \mathbb{Z}$.

Concerning the numbers $C_\Pi(\mathbf{j})$ we record for later reference the following
consequence of the joint cumulant summability hypothesis (3) and the
stationarity of $\{Z_j\}$. The proof is immediate from the definitions and therefore
omitted. Let $\mathbb{Z}^\Pi$ be the subgroup of $\mathbb{Z}^{2k}$ consisting of functions on $\{1, \dots, 2k\}$
constant on each part of $\Pi$.

Lemma 5.1. For every $\mathbf{j} : \{1, \dots, 2k\} \to \mathbb{Z}$, (i) the value of $C_\Pi(\mathbf{j})$ depends
only on the coset of $\mathbb{Z}^\Pi$ to which $\mathbf{j}$ belongs and moreover (ii) we have
\[ \sum_{\mathbf{j} \in \mathbb{Z}^{2k}/\mathbb{Z}^\Pi} |C_\Pi(\mathbf{j})| < \infty. \]

The lemma will be the basis for our limit calculations. 
Our immediate goal is to prove the following result. 

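Proposition 5.2. With the previous notation, we have
\[ C(\operatorname{trace} Y^{k_1}, \dots, \operatorname{trace} Y^{k_r}) = \sum_{\substack{\Pi \in \mathrm{Part}_2(2k) \\ \text{s.t. } \#\Pi_0 \vee \Pi_1 \vee \Pi = 1}} n^{-k + \#\Pi_0 \vee \Pi} \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} B(\mathbf{j}) C_\Pi(\mathbf{j}). \tag{20} \]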

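Proof. The proof involves an application of the Möbius inversion
formula (Lemma 4.4). Recall that
\[ \operatorname{trace} Y^k = \sum_{\mathbf{l} : (p,k)\text{-word}} Y(l(1), l(2))\, Y(l(2), l(3)) \cdots Y(l(k), l(1)). \]
Further, $Y(j_1, j_2) = B(j_1, j_2) \sum_i X(i, j_1) X(i, j_2)$ and hence
\[ \operatorname{trace} Y^k = \sum_{\substack{\mathbf{i} : (n,2k)\text{-word s.t.} \\ i(2t-1) = i(2t),\ t = 1, \dots, k}} \ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t.} \\ j(2t) = j(2t+1),\ t = 1, \dots, k}} B(\mathbf{j}) \prod_{\alpha=1}^{2k} X(i(\alpha), j(\alpha)), \]
where $j(2k+1)$ is defined by the "wrap-around rule": $j(2k+1) = j(1)$. Hence,
we have
\[ E (\operatorname{trace} Y^{k_1}) \cdots (\operatorname{trace} Y^{k_r}) = \sum_{\substack{\mathbf{i} : (n,2k)\text{-word s.t. } \mathbf{i} \\ \text{is } \Pi_0\text{-measurable}}} \ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} B(\mathbf{j})\, E \prod_{\alpha=1}^{2k} X(i(\alpha), j(\alpha)). \tag{21} \]
Using the representation of moments in terms of cumulants [see (15)] we get
\begin{align}
E (\operatorname{trace} Y^{k_1}) \cdots (\operatorname{trace} Y^{k_r})
&= \sum_{\substack{\mathbf{i} : (n,2k)\text{-word s.t. } \mathbf{i} \\ \text{is } \Pi_0\text{-measurable}}} \ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} B(\mathbf{j}) \sum_{\Pi \in \mathrm{Part}(2k)} C_\Pi\{X(i(\alpha), j(\alpha))\}_{\alpha=1}^{2k} \notag \\
&= \sum_{\substack{\mathbf{i} : (n,2k)\text{-word s.t. } \mathbf{i} \\ \text{is } \Pi_0\text{-measurable}}} \ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} n^{-k} B(\mathbf{j}) \sum_{\substack{\Pi \in \mathrm{Part}_2(2k) \\ \text{s.t. } \mathbf{i} \text{ is } \Pi\text{-measurable}}} C_\Pi(\mathbf{j}) \tag{22} \\
&= \sum_{\Pi \in \mathrm{Part}_2(2k)} n^{-k + \#\Pi_0 \vee \Pi} \sum_{\substack{\mathbf{j} : (p,2k)\text{-word} \\ \text{s.t. } \mathbf{j} \text{ is } \Pi_1\text{-measurable}}} B(\mathbf{j}) C_\Pi(\mathbf{j}), \notag
\end{align}
where in the next to last equality we used that cumulants of independent
variables vanish in order to restrict the summation to words $\mathbf{i}$ that are
$\Pi$-measurable.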

We next define an embedding of $\mathrm{Part}(r)$ in $\mathrm{Part}(2k)$. It will be convenient
to use $\pi, \sigma$ to denote elements of $\mathrm{Part}(r)$ and $\Pi, \Sigma$ to denote elements of
$\mathrm{Part}(2k)$. (Also we use upper case Roman letters for subsets of $\{1, \dots, 2k\}$
and lower case Roman letters for subsets of $\{1, \dots, r\}$.) Put
\[ A_1 = \{1, \dots, K_1\}, \ \dots, \ A_r = \{K_{r-1} + 1, \dots, K_r\}, \]
so that
\[ \Pi_0 \vee \Pi_1 = \{A_1, \dots, A_r\}. \]
Given $a \subset \{1, \dots, r\}$, let $a^* = \bigcup_{i \in a} A_i$, and given $\sigma \in \mathrm{Part}(r)$, let
\[ T(\sigma) = \{a^* \mid a \in \sigma\} \in \mathrm{Part}(2k). \]
Via $T$ the poset $\mathrm{Part}(r)$ maps isomorphically to the subposet of $\mathrm{Part}(2k)$
consisting of partitions refined by $\Pi_0 \vee \Pi_1$.
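We are ready to apply the Möbius inversion formula (Lemma 4.4).
Consider the real-valued functions $f$ and $g$ on $\mathrm{Part}(r)$ defined as follows:
\[ g(\pi) = \sum_{\substack{\Pi \in \mathrm{Part}_2(2k) \\ \Pi_0 \vee \Pi_1 \vee \Pi = T(\pi)}} n^{-k + \#\Pi_0 \vee \Pi} \sum_{\substack{\mathbf{j} : (p,2k)\text{-word} \\ \text{s.t. } \mathbf{j} \text{ is } \Pi_1\text{-measurable}}} B(\mathbf{j}) C_\Pi(\mathbf{j}) \tag{23} \]
and
\[ f(\sigma) = \sum_{\substack{\pi \in \mathrm{Part}(r) \\ \pi \text{ refines } \sigma}} g(\pi). \tag{24} \]
Now $\pi$ refines $\sigma$ if and only if $T(\pi)$ refines $T(\sigma)$. Therefore we have
\[ f(\sigma) = \sum_{\substack{\Pi \in \mathrm{Part}_2(2k) \\ \Pi_0 \vee \Pi_1 \vee \Pi \text{ refines } T(\sigma)}} n^{-k + \#\Pi_0 \vee \Pi} \sum_{\substack{\mathbf{j} : (p,2k)\text{-word} \\ \text{s.t. } \mathbf{j} \text{ is } \Pi_1\text{-measurable}}} B(\mathbf{j}) C_\Pi(\mathbf{j}). \tag{25} \]
Using (24) and applying Lemma 4.4, it follows that for any $\pi \in \mathrm{Part}(r)$,
\[ g(\pi) = \sum_{\substack{\sigma \in \mathrm{Part}(r) \\ \sigma \text{ refines } \pi}} \left( \prod_{a \in \pi} (-1)^{\#\{b \in \sigma \mid b \subset a\} - 1} (\#\{b \in \sigma \mid b \subset a\} - 1)! \right) f(\sigma). \tag{26} \]
An evident modification of the calculation (22) above gives for every $\sigma \in
\mathrm{Part}(r)$ that $E_\sigma(\operatorname{trace} Y^{k_1}, \dots, \operatorname{trace} Y^{k_r})$ equals the right-hand side of (25),
and therefore equals $f(\sigma)$. Thus, (26), when compared with (17), shows that
\[ g(\{\{1, \dots, r\}\}) = C(\operatorname{trace} Y^{k_1}, \dots, \operatorname{trace} Y^{k_r}), \]
which is exactly what we wanted to prove. □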

6. Limit calculations. We continue in the setting of Proposition 5.2. We
find the order of magnitude of the subsum of the right-hand side of (20)
indexed by $\Pi$ and compute limits as $p \to \infty$ in certain cases.

Proposition 6.1. Fix $\Pi \in \mathrm{Part}_2(2k)$ such that $\#\Pi_0 \vee \Pi_1 \vee \Pi = 1$. We
have
\[ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} B(\mathbf{j}) C_\Pi(\mathbf{j}) = O_{p \to \infty}\left( p\, b^{-1 + \#\Pi_1 \vee \Pi} \right), \tag{27} \]
where the implied constant depends only on $\Pi_0$, $\Pi_1$ and $\Pi$.

Before commencing the proof of the proposition, we record an elementary 
lemma which expresses in algebraic terms the fact that a tree is connected 
and simply connected. We omit the proof. We remark that a tree can have 
no edges joining a vertex to itself. 
Lemma 6.2. Let $T = (V,E)$ be a tree with vertex set $V \subset \{1, \dots, 2k\}$.
For each function $\mathbf{j} : V \to \mathbb{Z}$ define $\delta\mathbf{j} : E \to \mathbb{Z}$ by the rule
\[ \delta\mathbf{j}(\{\alpha, \beta\}) = j(\beta) - j(\alpha) \]
for all $\alpha, \beta \in V$ such that $\alpha < \beta$ and $\{\alpha, \beta\} \in E$. Then: (i) $\delta\mathbf{j} = 0$ implies
that $\mathbf{j}$ is constant. (ii) For every $\mathbf{k} : E \to \mathbb{Z}$ there exists $\mathbf{j} : V \to \mathbb{Z}$ unique up
to addition of a constant such that $\delta\mathbf{j} = \mathbf{k}$.

We will refer to $\delta$ as the increment operator associated to the tree $T$.
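
The content of Lemma 6.2 is that a function on the vertices of a tree can be recovered from its increments once its value at one vertex is fixed. The sketch below illustrates this with a breadth-first traversal; the representation of the tree as a vertex list and a list of edge pairs, and the function names, are assumptions of this example.

```python
from collections import deque

def increments(edges, j):
    """delta j: for each edge {a, b} with a < b, the value j(b) - j(a)."""
    return {(min(a, b), max(a, b)): j[max(a, b)] - j[min(a, b)]
            for a, b in edges}

def integrate(vertices, edges, k, root, root_value=0):
    """Given k on the edges of a tree, return the unique j with
    delta j = k and j(root) = root_value."""
    adj = {v: [] for v in vertices}
    for a, b in edges:
        a, b = min(a, b), max(a, b)
        adj[a].append((b, k[(a, b)]))     # moving from a to b adds the increment
        adj[b].append((a, -k[(a, b)]))    # moving from b to a subtracts it
    j = {root: root_value}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w, d in adj[v]:
            if w not in j:
                j[w] = j[v] + d
                queue.append(w)
    return j
```

Changing `root_value` shifts the reconstructed function by a constant, which is exactly the non-uniqueness allowed in part (ii) of the lemma.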

Proof of Proposition 6.1. We begin by constructing a tree $T$ to
which Lemma 6.2 will be applied. Let $E_2$ be the set consisting of all
two-element subsets of parts of $\Pi$. With
\[ V = \{1, \dots, 2k\}, \]
consider the graphs
\[ G_{012} = (V, \Pi_0 \cup \Pi_1 \cup E_2), \qquad G_{12} = (V, \Pi_1 \cup E_2), \qquad G_2 = (V, E_2). \]
By hypothesis the graph $G_{012}$ is connected, and further, the number of
connected components of $G_{12}$ (resp., $G_2$) equals $\#\Pi_1 \vee \Pi$ (resp., $\#\Pi$). Now
choose $E_2' \subset E_2$ so that $T_2 = (V, E_2')$ is a spanning forest in $G_2$, that is, a
subgraph with the same vertices but the smallest number of edges possible
consistent with having the same number of connected components. Then
choose $E_1' \subset \Pi_1$ such that $T_{12} = (V, E_1' \cup E_2')$ is a spanning forest in $G_{12}$,
and finally choose $E_0' \subset \Pi_0$ such that $T_{012} = (V, E_0' \cup E_1' \cup E_2')$ is a spanning
tree in $G_{012}$. By construction, the sets $E_i'$, $i = 0, 1, 2$, are disjoint. Note that
Lemma 6.2 applies not only to $T_{012}$, but also to the connected components
of $T_{12}$ and $T_2$. Note that
\[ \#E_0' = -1 + \#\Pi_1 \vee \Pi \tag{28} \]
by construction. Hereafter we write simply $T = T_{012}$.

The bound in (27) will be obtained by relaxing some of the constraints
concerning the collection of words $\mathbf{j}$ over which the summation runs. We
will work with the increment operator $\delta$ associated to $T$ by Lemma 6.2.
For $i = 0, 1, 2$ let $S_i$ be the Abelian group (independent of $p$) consisting of
functions $\mathbf{j} : V \to \mathbb{Z}$ such that:

• $j(1) = 0$,

• $\delta\mathbf{j}$ is supported on the set $E_i'$.

Also let
\[ S_{-1} = \{\mathbf{j} : V \to \mathbb{Z} \mid \delta\mathbf{j} = 0\} = \{\mathbf{j} : V \to \mathbb{Z} \mid \mathbf{j} \text{ constant}\}, \]




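which is independent of $p$. Recall that for any partition $\Pi$, $\mathbb{Z}^\Pi$ is the subgroup
of $\mathbb{Z}^{2k}$ consisting of functions on $\{1, \dots, 2k\}$ constant on each part of $\Pi$. By
Lemma 6.2 applied to $T$ and also to the connected components of $T_{12}$ and
$T_2$, we have
\begin{align}
\mathbb{Z}^{2k} &= S_{-1} \oplus S_0 \oplus S_1 \oplus S_2, \notag \\
\mathbb{Z}^{\Pi} &= S_{-1} \oplus S_0 \oplus S_1, \tag{29} \\
\mathbb{Z}^{\Pi_1 \vee \Pi} &= S_{-1} \oplus S_0. \notag
\end{align}
Let $S_0^{(p)} \subset S_{-1} \oplus S_0$ be the subset (depending on $p$) consisting of functions
$\mathbf{j} : V \to \mathbb{Z}$ such that:

• $j(1) \in \{1, \dots, p\}$,

• $|\delta\mathbf{j}(e)| \le b$ for all $e \in E_0'$.

Now if $\mathbf{j}$ is a $\Pi_1$-measurable $(p, 2k)$-word such that $B(\mathbf{j})$ does not vanish,
then the following hold:

• $j(1) \in \{1, \dots, p\}$,

• $|\delta\mathbf{j}(e)| \le b$ for $e \in E_0'$ (because $E_0' \subset \Pi_0$),          (30)

• $\delta\mathbf{j}(e) = 0$ for $e \in E_1'$ (because $E_1' \subset \Pi_1$).

By (29) it follows that a $\Pi_1$-measurable $(p, 2k)$-word $\mathbf{j}$ such that $B(\mathbf{j})$ does
not vanish has a unique decomposition $\mathbf{j} = \mathbf{j}_0 + \mathbf{j}_2$ with $\mathbf{j}_0 \in S_0^{(p)}$ and $\mathbf{j}_2 \in S_2$,
and moreover we necessarily have
\[ C_\Pi(\mathbf{j}) = C_\Pi(\mathbf{j}_2) \tag{31} \]
by Lemma 5.1(i) and the $\Pi$-measurability of $\mathbf{j}_0$.
We now come to the end of the proof. We have
\[ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word} \\ \text{s.t. } \mathbf{j} \text{ is } \Pi_1\text{-measurable}}} |B(\mathbf{j}) C_\Pi(\mathbf{j})| \le \#S_0^{(p)} \sum_{\mathbf{j} \in S_2} |C_\Pi(\mathbf{j})| \le p (2b+1)^{-1 + \#\Pi_1 \vee \Pi} \sum_{\mathbf{j} \in S_2} |C_\Pi(\mathbf{j})| \]
at the first inequality by (29), (31) and at the second inequality by the
evident estimate for $\#S_0^{(p)}$ based on (28). Finally, finiteness of the sum over
$S_2$ follows from (29) and Lemma 5.1(ii). □

We note in passing that in the proof of Proposition 6.1, we over-estimated
the left-hand side of (27) by requiring in (30) that $|\delta\mathbf{j}(e)| \le b$ only for $e \in E_0'$,
rather than for all $e \in \Pi_0$.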
Proposition 6.3. We continue under the hypotheses of the preceding
proposition, and now make the further assumption that $\#\Pi_1 \vee \Pi = 1$. Then
we have
\[ \sum_{\substack{\mathbf{j} : (p,2k)\text{-word s.t. } \mathbf{j} \\ \text{is } \Pi_1\text{-measurable}}} (1 - B(\mathbf{j})) C_\Pi(\mathbf{j}) = o_{p \to \infty}(p). \tag{32} \]

Proof. We continue in the graph-theoretical setup of the proof of the
preceding proposition. But now, under our additional hypothesis that $\#\Pi_1 \vee
\Pi = 1$, the set $E_0'$ is empty, and hence the set $S_0^{(p)}$ is now simply the set of
constant functions on $\{1, \dots, 2k\}$ taking values in the set $\{1, \dots, p\}$. Fix $\varepsilon > 0$
arbitrarily and then choose a finite set $F \subset S_2$ such that $\sum_{\mathbf{j} \in S_2 \setminus F} |C_\Pi(\mathbf{j})| < \varepsilon$.
Let
\[ N = \max\{ |j(\alpha) - j(\beta)| \mid \alpha, \beta \in \{1, \dots, 2k\},\ \mathbf{j} \in F \}. \]
Let $\mathbf{j}$ be a $\Pi_1$-measurable $(p, 2k)$-word and write $\mathbf{j} = \mathbf{j}_0 + \mathbf{j}_2$ with $\mathbf{j}_0$ a constant
function with values in $\{1, \dots, p\}$ and $\mathbf{j}_2 \in S_2$. If $\mathbf{j}_2 \in F$ then, provided $p$ is
large enough to guarantee that $b > N$, we automatically have $B(\mathbf{j}) = 1$. Thus
the sum in question is bounded in absolute value by $\varepsilon p$ for $p \gg 0$. Since $\varepsilon$ is
arbitrary, the proposition is proved. □

The proof of the following proposition is immediate from the definitions 
and therefore omitted. 



Proposition 6.4. Under exactly the same hypotheses as the preceding
proposition we have

$$\lim_{p\to\infty} \frac{1}{p} \sum_{\substack{j:\ (p,2k)\text{-word s.t. } j\\ \text{is } \Pi_1\text{-measurable}}} C_\Pi(j) \;=\; \sum_{j \in \mathbb{Z}^{\Pi_1}/\mathbb{Z}^{\Pi_1 \vee \Pi}} C_\Pi(j). \tag{33}$$

Lemma 5.1 guarantees that the sum on the right is well defined.

7. Proof of the law of large numbers. This section is devoted to the 
proof of Theorem 2.3. The main point of the proof is summarized by the 
following result. 

Proposition 7.1. Let Assumptions 2.1 and 2.2 hold. Let $Y = Y^{(p)}$ be
as in (4). Let $R_0^{(k)}$ be as in (8). Then we have

$$\lim_{p\to\infty} p^{-1}\, E \operatorname{trace} Y^k = R_0^{(k)} \tag{34}$$

for every integer $k > 0$.

From the case $r = 2$ of Proposition 8.1, which is proved in the next section,
it follows that

$$\lim_{p\to\infty} \operatorname{Var}\Bigl(\frac{1}{p} \operatorname{trace} Y^k\Bigr) = 0 \tag{35}$$

for all integers $k > 0$. Arguing just as at the end of the proof of [1], Theorem
3.2, one can then deduce Theorem 2.3 from equations (7), (34) and (35). We
omit those details. Thus, to finish the proof of Theorem 2.3, we just have
to prove Proposition 7.1. (There will be no circularity of reasoning since the
proof of Proposition 8.1 does not use Theorem 2.3.)



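Proof of Proposition 7.1. Back in the setting of Proposition 5.2
with $r = 1$ (in which case $\#\Pi_0 \vee \Pi_1 = 1$), we have

$$\frac{1}{p}\, E \operatorname{trace} Y^k = \sum_{\Pi \in \mathrm{Part}_2(2k)} p^{-1} n^{-k+\#\Pi_0\vee\Pi} \sum_{\substack{j:\ (p,2k)\text{-word s.t. } j\\ \text{is } \Pi_1\text{-measurable}}} B(j)\,C_\Pi(j).$$

For fixed $\Pi \in \mathrm{Part}_2(2k)$ the contribution to the total sum is

$$O\Bigl(\Bigl(\frac{b}{n}\Bigr)^{-1+\#\Pi_1\vee\Pi} n^{-1-k+\#\Pi_0\vee\Pi+\#\Pi_1\vee\Pi}\Bigr)$$

by Proposition 6.1. Thus, in view of Proposition 3.1, specifically estimate
(13), in order to evaluate the limit in question, we can throw away all terms,
save those associated to $\Pi = \Pi_0$. We therefore have

$$\lim_{p\to\infty} \frac{1}{p}\, E \operatorname{trace} Y^k = \sum_{j \in \mathbb{Z}^{\Pi_1}/\mathbb{Z}^{\Pi_0\vee\Pi_1}} C_{\Pi_0}(j) \tag{36}$$

by Propositions 6.3 and 6.4. Recalling that $R(j - i) = C(Z_i, Z_j)$, and writing

$$j = (j_1, j_2, j_2, \dots, j_k, j_k, j_1)$$

we have

$$C_{\Pi_0}(j) = R(j_2 - j_1) \cdots R(j_k - j_{k-1})\, R(j_1 - j_k),$$

and hence

$$\sum_{j \in \mathbb{Z}^{\Pi_1}/\mathbb{Z}^{\Pi_0\vee\Pi_1}} C_{\Pi_0}(j) = \sum_{j_2,\dots,j_k \in \mathbb{Z}} R(j_2 - j_1) \cdots R(j_k - j_{k-1})\, R(j_1 - j_k) = R_0^{(k)}$$

for any fixed $j_1 \in \mathbb{Z}$. The proof of (34) is complete. □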
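As a numerical illustration of the last display, the following minimal sketch evaluates the cyclic sum by brute force for a finitely supported covariance sequence and compares it with the lag-0 value of a $k$-fold convolution. It assumes that $R^{(k)}$ in (8) denotes the $k$-fold convolution of $R$ with itself, which is what the display suggests; the function names and the test sequence $R(0) = 1$, $R(\pm 1) = 1/2$ are hypothetical choices made only for this example.

```python
import itertools
import numpy as np


def conv_power_at_zero(R, k):
    """Lag-0 value of the k-fold convolution of a finitely supported sequence.

    R is passed as an odd-length array (R(-m), ..., R(0), ..., R(m)).
    """
    R = np.asarray(R, dtype=float)
    out = R.copy()
    for _ in range(k - 1):
        out = np.convolve(out, R)  # full convolution; lag 0 stays at the centre
    return out[len(out) // 2]


def cyclic_sum(R, k, j1=0):
    """Brute-force sum over j_2..j_k of R(j2-j1) ... R(jk-j_{k-1}) R(j1-jk)."""
    R = np.asarray(R, dtype=float)
    m = len(R) // 2

    def r(lag):
        return R[m + lag] if abs(lag) <= m else 0.0

    # Nonzero terms need |j_{i+1} - j_i| <= m, hence |j_i - j1| <= (k-1)*m.
    reach = (k - 1) * m
    total = 0.0
    for js in itertools.product(range(j1 - reach, j1 + reach + 1), repeat=k - 1):
        word = (j1,) + js + (j1,)
        term = 1.0
        for a, c in zip(word, word[1:]):
            term *= r(c - a)
        total += term
    return total


R_example = [0.5, 1.0, 0.5]  # hypothetical sequence: R(0) = 1, R(+-1) = 1/2
values = {k: (conv_power_at_zero(R_example, k), cyclic_sum(R_example, k))
          for k in (1, 2, 3, 4)}
```

For this sequence both routines return 1.5 for $k = 2$ and 2.5 for $k = 3$, and the result does not depend on the choice of the fixed index `j1`, in line with stationarity.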
8. Proof of the central limit theorem. This section is devoted to the
proof of Theorem 2.4. The main point of the proof is summarized by the 
following proposition. 

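Proposition 8.1. Let Assumptions 2.1 and 2.2 hold. Let $Y = Y^{(p)}$ be
as in (4). Let $Q_{ij}$ and $R_i^{(m)}$ be as in (8). Then for each integer $r \ge 2$, and
all positive integers $k_1, \dots, k_r$, we have

$$\lim_{p\to\infty} \Bigl(\frac{n}{p}\Bigr)^{r/2} C(\operatorname{trace} Y^{k_1}, \dots, \operatorname{trace} Y^{k_r}) = \begin{cases} 0, & \text{if } r > 2, \\ k_1 k_2 \Bigl( 2 R_0^{(k_1+k_2)} + \sum_{i,j} R_i^{(k_1-1)} Q_{ij} R_j^{(k_2-1)} \Bigr), & \text{if } r = 2. \end{cases}$$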



In view of Lemma 4.2, in order to finish the proof of Theorem 2.4 by the 
method of moments, we just have to prove Proposition 8.1. 

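Proof of Proposition 8.1. Back in the setting of Proposition 5.2,
this time assuming $r \ge 2$, we have

$$\Bigl(\frac{n}{p}\Bigr)^{r/2} C(\operatorname{trace} Y^{k_1}, \dots, \operatorname{trace} Y^{k_r}) = \sum_{\substack{\Pi \in \mathrm{Part}_2(2k)\\ \text{s.t. } \#\Pi_0\vee\Pi_1\vee\Pi = 1}} p^{-r/2} n^{r/2-k+\#\Pi_0\vee\Pi} \sum_{\substack{j:\ (p,2k)\text{-word s.t. } j\\ \text{is } \Pi_1\text{-measurable}}} B(j)\,C_\Pi(j), \tag{37}$$

and for fixed $\Pi$ the contribution to the total sum is

$$O\Bigl(p^{1-r/2}\, n^{r/2-k-1+\#\Pi_0\vee\Pi+\#\Pi_1\vee\Pi} \Bigl(\frac{b}{n}\Bigr)^{-1+\#\Pi_1\vee\Pi}\Bigr)$$

by Proposition 6.1. In view of Proposition 3.1, specifically estimate (14), we
are already done in the case $r > 2$.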

For the rest of the proof assume $r = 2$. By the estimate immediately
above many terms can be dropped from the sum on the right-hand side of (37)
without changing the limit as $p \to \infty$. The terms remaining can be analyzed
by means of Propositions 3.1, 6.3 and 6.4. We thus obtain the formula

$$\lim_{p\to\infty} \frac{n}{p}\, C(\operatorname{trace} Y^{k_1}, \operatorname{trace} Y^{k_2}) = \sum_{\substack{\Pi \in \mathrm{Part}_2(2k)\\ \text{s.t. } \#\Pi_1\vee\Pi = 1\\ \text{and } \#\Pi_0\vee\Pi = k-1}} K(\Pi) \tag{38}$$

where

$$K(\Pi) = \sum_{j \in \mathbb{Z}^{\Pi_1}/\mathbb{Z}^{\Pi_1\vee\Pi}} C_\Pi(j).$$

It remains only to classify the $\Pi$'s appearing on the right-hand side of (38)
and for each to evaluate $K(\Pi)$.

We turn to the classification of $\Pi$ appearing on the right-hand side of
(38). Recall that in the setup of Proposition 5.2 with $r = 2$, we have

$$\Pi_0 = \{\{1,2\}, \dots, \{2k-1, 2k\}\},$$

$$\Pi_1 = \{\{2,3\}, \dots, \{2k_1, 1\}, \{2k_1+2, 2k_1+3\}, \dots, \{2k, 2k_1+1\}\}.$$

The conditions

$$\#\Pi_0\vee\Pi = k-1, \qquad \#\Pi_1\vee\Pi = 1$$

dictate that we must have

$$(\Pi_0 \vee \Pi) \setminus \Pi_0 = \{A \cup A'\}, \qquad \Pi_0 \setminus (\Pi_0 \vee \Pi) = \{A, A'\}$$

for some $A, A' \in \Pi_0$ with

$$A \subset \{1, \dots, 2k_1\}, \qquad A' \subset \{2k_1+1, \dots, 2k\}.$$

There are exactly $k_1 k_2$ ways of choosing such $A$ and $A'$, and for each such
choice, there are exactly three possibilities for $\Pi$, two of which are perfect
matchings and one which has all parts of size 2 except for one part of size
4. That is, either

$$\Pi = (\Pi_0 \setminus \{A, A'\}) \cup \{\{\min A, \min A'\}, \{\max A, \max A'\}\} \tag{39}$$

or

$$\Pi = (\Pi_0 \setminus \{A, A'\}) \cup \{\{\min A, \max A'\}, \{\max A, \min A'\}\} \tag{40}$$

or

$$\Pi = (\Pi_0 \setminus \{A, A'\}) \cup \{A \cup A'\}. \tag{41}$$

Thus we have enumerated all possible $\Pi$'s appearing on the right-hand side
of formula (38). We remark that Figure 1 depicts examples of $\Pi$ falling into
patterns (41), (39), respectively.
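The count $3 k_1 k_2$ implied by this classification can be checked by exhaustive enumeration for small $k_1, k_2$. The sketch below is only an illustration: it assumes that $\mathrm{Part}_2(2k)$ denotes the set partitions of $\{1,\dots,2k\}$ all of whose parts have size at least 2 (an assumption suggested by the examples in this section, not restated here), and all function names are hypothetical.

```python
from itertools import combinations


def partitions_min2(elems):
    """Yield all set partitions of `elems` whose blocks have size >= 2."""
    elems = list(elems)
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    # Choose the companions of `first`; at least one, so no singleton blocks.
    for size in range(1, len(rest) + 1):
        for others in combinations(rest, size):
            block = (first,) + others
            remaining = [x for x in rest if x not in others]
            for tail in partitions_min2(remaining):
                yield [block] + tail


def join_count(n, *partitions):
    """Number of blocks in the join of partitions of {1..n} (union-find)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for part in partitions:
        for block in part:
            for y in block[1:]:
                parent[find(y)] = find(block[0])
    return len({find(x) for x in range(1, n + 1)})


def pi0(k):
    return [(2 * i - 1, 2 * i) for i in range(1, k + 1)]


def pi1(k1, k2):
    k = k1 + k2
    first = [(2 * i, 2 * i + 1) for i in range(1, k1)] + [(2 * k1, 1)]
    second = [(2 * i, 2 * i + 1) for i in range(k1 + 1, k)] + [(2 * k, 2 * k1 + 1)]
    return first + second


def count_contributing(k1, k2):
    """Count Pi with #(Pi_0 v Pi) = k - 1 and #(Pi_1 v Pi) = 1."""
    k = k1 + k2
    P0, P1 = pi0(k), pi1(k1, k2)
    return sum(
        1
        for P in partitions_min2(range(1, 2 * k + 1))
        if join_count(2 * k, P0, P) == k - 1 and join_count(2 * k, P1, P) == 1
    )
```

For instance, `count_contributing(2, 2)` enumerates the 715 singleton-free partitions of an 8-element set and finds 12 that satisfy both conditions, matching $3 k_1 k_2$.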

We turn to the evaluation of $K(\Pi)$ in the cases (39), (40). In these cases,
simply because $\#\Pi \vee \Pi_1 = 1$ and $\Pi$ is a perfect matching, it is possible to
choose a permutation $\sigma$ of $\{1, \dots, 2k\}$ such that

$$\Pi_1 = \{\{\sigma(2), \sigma(3)\}, \dots, \{\sigma(2k), \sigma(1)\}\},$$

$$\Pi = \{\{\sigma(1), \sigma(2)\}, \dots, \{\sigma(2k-1), \sigma(2k)\}\},$$

and so we find in these cases that

$$K(\Pi) = R_0^{(k)} \tag{42}$$

by a repetition of the calculation done at the end of the proof of Proposition 7.1.

We turn finally to the evaluation of $K(\Pi)$ in the case (41). In this case
there is enough symmetry to guarantee that $K(\Pi)$ does not depend on $A$
and $A'$. We may therefore assume without loss of generality that

$$A = \{2k_1 - 1, 2k_1\}, \qquad A' = \{2k_1 + 1, 2k_1 + 2\}$$

in order to evaluate $K(\Pi)$. To compress notation we write

$$C_{j_1 j_2 j_3 j_4} = C(Z_{j_1}, Z_{j_2}, Z_{j_3}, Z_{j_4}), \qquad R^{(m)}_{ij} = R^{(m)}_{j-i}, \qquad R_{ij} = R^{(1)}_{ij}.$$

Assume temporarily that $k_1, k_2 > 1$. Since $R_{ij} = C(Z_i, Z_j)$ we then have for
any fixed $j_1 \in \mathbb{Z}$ that

$$K(\Pi) = \sum_{j_2,\dots,j_k \in \mathbb{Z}} R_{j_1 j_2} \cdots R_{j_{k_1-1} j_{k_1}}\, C_{j_{k_1} j_1 j_{k_1+1} j_{k_1+2}}\, R_{j_{k_1+2} j_{k_1+3}} \cdots R_{j_k j_{k_1+1}}$$

and hence after summing over "interior" indices we have

$$K(\Pi) = \sum_{j_{k_1}, j_{k_1+1}, j_{k_1+2} \in \mathbb{Z}} R^{(k_1-1)}_{j_1 j_{k_1}}\, C_{j_{k_1} j_1 j_{k_1+1} j_{k_1+2}}\, R^{(k_2-1)}_{j_{k_1+2} j_{k_1+1}} = \sum_{i,j} R_i^{(k_1-1)} Q_{ij} R_j^{(k_2-1)}. \tag{43}$$

One can then easily check by separate arguments that (43) remains valid
when $k_1$ or $k_2$ or both take the value 1.

Together (38)-(43) complete the proof. □



9. Concluding comments. 

1 . We have presented a combinatorial approach to the study of limits for 
the spectrum of regularized covariance matrices. We have chosen to present 
the technique in the simplest possible setting, that is, the stationary setup 
with good a-priori estimates on the moments of the individual entries. Some 
directions for generalization of this setup are to allow nonstationary se- 
quences with covariances, as in [5], or to allow for perturbations of the 
stationary setup, as in [3], or to relax the moment conditions of Assump- 
tion 2.2. In these more general situations, especially in the context of the 
LLN, the techniques we have presented here, plus standard approximation 
techniques, are likely to yield results. But to keep focused we do not study 
these here. 

2. A natural question is whether our approach applies also to the study
of centered empirical covariances, that is, matrices $\tilde Y = \tilde Y^{(p)}$ with entries

$$\tilde Y(i,j) = B_{ij}\,\bigl((X - \bar X)^T (X - \bar X)\bigr)_{ij},$$

where $\bar X_{ij} = n^{-1} \sum_{k=1}^{n} X_{kj} = n^{-3/2} \sum_{k=1}^{n} Z_j^{(k)}$. To a limited extent it does, as
we now explain. Note that with $(\Delta^{(p)})_{ij} = n B_{ij} \bar X_{1i} \bar X_{1j} = B_{ij} \bigl(n^{-1} \sum_{k=1}^{n} Z_i^{(k)}\bigr) \bigl(n^{-1} \sum_{k=1}^{n} Z_j^{(k)}\bigr)$
we have $\tilde Y = Y - \Delta^{(p)}$. In contrast to the non-banded version,
the perturbation $\Delta = \Delta^{(p)}$ is not of rank one. However, for some
deterministic constants $C_1$, $C_2$,



1:1= 

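In particular,

$$\sum_{i,j} \Delta(i,j)^2 = O_P(bp/n^2),$$

where for a sequence of positive random variables $W_p$ and deterministic
positive sequence $g_p$ we say that $W_p = O_P(g_p)$ if $W_p/g_p$ is a tight sequence
as $p \to \infty$. Letting $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_p$ denote the (ordered) collection of
eigenvalues of $Y^{(p)}$ and $\tilde\lambda_1 \le \tilde\lambda_2 \le \cdots \le \tilde\lambda_p$ that of $\tilde Y^{(p)}$, we conclude that
$\sum_{i=1}^{p} |\lambda_i - \tilde\lambda_i|^2 = O_P(bp/n^2)$. Let $\mathrm{Lip}_1$ denote the collection of deterministic
Lipschitz functions on $\mathbb{R}$ bounded by 1 and with Lipschitz constant 1. Since

$$\sup_{f \in \mathrm{Lip}_1} \sum_{i=1}^{p} |f(\lambda_i) - f(\tilde\lambda_i)| \le \sqrt{p}\, \Bigl( \sum_{i=1}^{p} |\lambda_i - \tilde\lambda_i|^2 \Bigr)^{1/2} = O_P(\sqrt{b}\, p/n),$$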



it follows from Theorem 2.3 that the conclusion of that theorem remains true
if we substitute $\tilde L$, the empirical measure of the eigenvalues of $\tilde Y^{(p)}$, for $L$.
But this line of reasoning is not powerful enough to yield the conclusion of
Theorem 2.4 with $\tilde Y$ replacing $Y$, unless one imposes additional conditions
that, in particular, imply $bp/n \to_{p\to\infty} 0$. Thus it appears that our basic
method itself requires some significant modification to yield a CLT in the
regime of Assumption 2.1 in the centered case. The problem is open.
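The algebra behind this comment can be checked numerically. The following sketch is an illustration only: it assumes the normalization $X_{kj} = n^{-1/2} Z_j^{(k)}$ (inferred from the factor $n^{-3/2}$ above), takes the $Z_j^{(k)}$ to be i.i.d. standard Gaussian as the simplest stationary example, and uses hypothetical names and sizes throughout. It verifies that the centered banded matrix equals $Y - \Delta$, that $\Delta$ has rank larger than one, and that the squared eigenvalue differences are bounded by $\sum_{i,j} \Delta(i,j)^2$ (the Hoffman-Wielandt inequality, which is presumably the step used to pass from $\Delta$ to the eigenvalues).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, b = 50, 8, 2  # hypothetical sizes, chosen only for illustration


def band_mask(p, b):
    """0/1 matrix keeping entries within distance b of the diagonal."""
    idx = np.arange(p)
    return (np.abs(idx[:, None] - idx[None, :]) <= b).astype(float)


# Assumed normalization: rows of X are samples scaled by n^{-1/2}.
Z = rng.standard_normal((n, p))
X = Z / np.sqrt(n)
xbar = X.mean(axis=0)  # common row of the matrix of column means

B = band_mask(p, b)
Y = B * (X.T @ X)                       # banded empirical covariance
Xc = X - xbar
Y_centered = B * (Xc.T @ Xc)            # banded centered version
Delta = B * (n * np.outer(xbar, xbar))  # banded rank-one correction

identity_holds = np.allclose(Y_centered, Y - Delta)
rank_of_delta = np.linalg.matrix_rank(Delta)

# Ordered eigenvalues of the two symmetric matrices.
lam = np.linalg.eigvalsh(Y)
lam_centered = np.linalg.eigvalsh(Y_centered)
eig_gap_sq = float(np.sum((lam - lam_centered) ** 2))
delta_sq = float(np.sum(Delta ** 2))
```

Without the banding mask the correction `n * np.outer(xbar, xbar)` would have rank one; multiplying entrywise by the band is what destroys that structure.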

3. We emphasize that unlike the results in [3], we do not deal at all with 
the distance (in operator norm, or otherwise) between the banded empirical 
covariance matrix Y, and the covariance matrix of the process {Zj}. 

4. A natural question arising from the central limit theorem (Theorem
2.4) is whether one can obtain an approximation for $E \operatorname{trace} Y^k$ with $O_{p\to\infty}(1)$
error. We recall that in the context of classical Wishart matrices, compact 
formulas for these quantities can be written down; see [1] and references 
therein. A similar attempt to provide such formulas here seems to run into 
many subcases, depending on the relations between the parameters p,n,b, 
and on the convergence rate in the summability condition. We were un- 
successful in finding compactly expressible results. We thus omit this topic 
entirely.

5. We finally mention a combinatorial question arising from Proposition 3.1.
In the setting of that proposition, it can be shown that for perfect
matchings $\Pi$ the estimate

$$\#\Pi_0 \vee \Pi + \#\Pi_1 \vee \Pi \le k + 2 - r \tag{44}$$

holds and is sharp. But (44) is too strong to hold in general, as is shown by
the example

$$\Pi_0 = \{\{1,2\}, \{3,4\}, \{5,6\}, \{7,8\}, \{9,10\}, \{11,12\}\},$$
$$\Pi_1 = \{\{2,3\}, \{1,4\}, \{5,6\}, \{7,8\}, \{9,10\}, \{11,12\}\},$$
$$\Pi = \{\{1,5,6\}, \{2,7,8\}, \{3,9,10\}, \{4,11,12\}\}$$

for which

$$\#\Pi_0 \vee \Pi = \#\Pi_1 \vee \Pi = 2, \qquad k = 6, \qquad r = 5,$$

and the same example leaves open the possibility that (14) is too weak. How 
then can one sharpen (14)? The problem is open. 
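The counterexample can be verified mechanically. The sketch below, with a hypothetical helper `join_count` based on union-find, computes the number of blocks in each join and confirms that the left-hand side of (44) equals 4 while $k + 2 - r = 3$; it also checks that $\#\Pi_0 \vee \Pi_1 = 5$, which is how we read the value $r = 5$.

```python
def join_count(n, *partitions):
    """Number of blocks in the join of partitions of {1..n} (union-find)."""
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for part in partitions:
        for block in part:
            for y in block[1:]:
                parent[find(y)] = find(block[0])
    return len({find(x) for x in range(1, n + 1)})


P0 = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
P1 = [(2, 3), (1, 4), (5, 6), (7, 8), (9, 10), (11, 12)]
P = [(1, 5, 6), (2, 7, 8), (3, 9, 10), (4, 11, 12)]

k = 6
r = join_count(12, P0, P1)                            # blocks of Pi_0 v Pi_1
lhs = join_count(12, P0, P) + join_count(12, P1, P)   # left side of (44)
violates_44 = lhs > k + 2 - r
```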